Morpho: A decoupled MapReduce framework for elastic cloud computing

Authors

  • Lu Lu
  • Xuanhua Shi
  • Hai Jin
  • Qiuyue Wang
  • Daxing Yuan
  • Song Wu
Abstract

MapReduce as a service enjoys wide adoption in commercial clouds today [3,23]. But most cloud providers simply deploy native Hadoop [24] systems on their cloud platforms to provide MapReduce services, without adapting them to these virtualized environments [6,25]. In cloud environments, the basic executing units of data processing are virtual machines. Each user's virtual cluster must deploy HDFS [26] every time it is initialized, and the user's input and output data must be transferred between HDFS and external persistent data storage for native Hadoop to work properly. These costly data movements can significantly degrade the performance of MapReduce jobs in the cloud. We present Morpho, a modified version of the Hadoop MapReduce framework that decouples storage and computation into physical clusters and virtual clusters, respectively. In Morpho, the map/reduce tasks still run in VMs, but without corresponding ad-hoc HDFS deployments; instead, HDFS is deployed on the underlying physical machines. During MapReduce computation, the map tasks can read data directly from the physical machines without any extra data transfers. We design a data location perception module to improve cooperation between the computation and storage layers: the map tasks can intelligently fetch information about the network topology of the physical machines and the VM placements. Additionally, Morpho achieves high performance through two complementary strategies, for data placement and for VM placement, which provide better map and reduce input locality. Furthermore, our data placement strategy can mitigate the resource contentions...
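The abstract says that map tasks learn both the physical network topology and the VM placements so they can read HDFS blocks from nearby physical machines. The following is only a minimal sketch of that idea, assuming a two-level host/rack topology and Hadoop-style locality levels (node-local, rack-local, off-rack). The names `Topology`, `Locality`, `locality`, and `best_vm` are hypothetical and are not Morpho's actual interface. The sketch shows how knowing which physical host runs each VM lets a scheduler score VMs against the physical hosts holding a block's replicas.

```python
from dataclasses import dataclass
from enum import IntEnum


class Locality(IntEnum):
    """Hypothetical locality ranking: a lower value means better locality."""
    NODE_LOCAL = 0  # the VM runs on a physical host that stores a replica
    RACK_LOCAL = 1  # the VM's host shares a rack with a replica holder
    OFF_RACK = 2    # the block must be read across racks


@dataclass
class Topology:
    # Hypothetical inputs a "data location perception" layer could expose.
    vm_host: dict[str, str]    # VM id -> physical host running that VM
    host_rack: dict[str, str]  # physical host -> rack id

    def locality(self, vm: str, replica_hosts: list[str]) -> Locality:
        """Classify how close a VM is to the physical hosts holding a block."""
        host = self.vm_host[vm]
        if host in replica_hosts:
            return Locality.NODE_LOCAL
        rack = self.host_rack.get(host)
        if any(self.host_rack.get(h) == rack for h in replica_hosts):
            return Locality.RACK_LOCAL
        return Locality.OFF_RACK

    def best_vm(self, idle_vms: list[str], replica_hosts: list[str]) -> str:
        """Pick the idle VM with the best locality; ties keep list order."""
        return min(idle_vms, key=lambda vm: self.locality(vm, replica_hosts))


if __name__ == "__main__":
    # Hypothetical layout: pm1 and pm2 share rack r1, and pm3 is on rack r2.
    topo = Topology(
        vm_host={"vm1": "pm1", "vm2": "pm2", "vm3": "pm3"},
        host_rack={"pm1": "r1", "pm2": "r1", "pm3": "r2"},
    )
    # The block is replicated on the physical datanodes pm2 and pm3.
    print(topo.locality("vm1", ["pm2", "pm3"]).name)  # RACK_LOCAL
    print(topo.best_vm(["vm1", "vm2", "vm3"], ["pm2", "pm3"]))  # vm2
```

Because HDFS runs on the physical machines in this design, "node-local" here means that a VM is co-located with a datanode on the same physical host. That is why the VM-to-host mapping, and not only the VM's own address, determines locality.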

Journal:
  • Future Generation Comp. Syst.

Volume 36, Issue:

Pages: -

Publication date: 2014